Relation Extraction for Semantic Intranet Annotations
Abstract
We present an approach for ontology-driven extraction of relations from text, aimed mainly at producing enriched semantic annotations for the Semantic Web. The approach combines linguistic and empirical strategies in a pipeline that involves components such as a parser, a part-of-speech tagger, a named entity recognition system, and pattern-based classification, together with resources including an ontology, knowledge bases, and lexical databases. A preliminary evaluation with 25 sentences showed that combining knowledge-intensive resources and strategies with corpus-based techniques to process the input data makes it possible to identify relevant relations between both known and new entity pairs mentioned in the text. Besides Semantic Web annotation, the system can be used for other tasks, including ontology population, since it identifies new instantiations of existing relations and entities, and ontology learning, since it discovers new relations that are not part of the ontology.
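The pipeline idea in the abstract can be illustrated with a deliberately tiny sketch. It is not the authors' system: the gazetteer lookup stands in for named entity recognition, the phrase table stands in for pattern-based classification, and every name here (`ENTITIES`, `ONTOLOGY_RELATIONS`, `PATTERNS`, `extract_relations`) is a hypothetical choice made for illustration. The `in_ontology` flag separates the two uses the abstract mentions: a relation already licensed by the ontology is a candidate for ontology population, while one that is not is a candidate for ontology learning.

```python
from dataclasses import dataclass

# Hypothetical toy resources; a real system would load an ontology and lexicons.
ENTITIES = {
    "Alice Smith": "Person",
    "Bob Jones": "Person",
    "Knowledge Lab": "Organization",
}
# Relations the toy ontology already defines, as (subject type, name, object type).
ONTOLOGY_RELATIONS = {("Person", "works_for", "Organization")}
# Surface phrases mapped to relation names (stand-in for pattern-based classification).
PATTERNS = {"works for": "works_for", "works at": "works_for", "visited": "visited"}


@dataclass(frozen=True)
class Relation:
    subject: str
    name: str
    obj: str
    in_ontology: bool  # True: ontology population; False: candidate for ontology learning


def find_entities(sentence):
    """Gazetteer lookup standing in for named entity recognition."""
    found = [(sentence.find(name), name) for name in ENTITIES if name in sentence]
    return [name for _, name in sorted(found)]  # ordered by position in the sentence


def extract_relations(sentence):
    """Match a known phrase between each pair of adjacent entities."""
    entities = find_entities(sentence)
    relations = []
    for subj, obj in zip(entities, entities[1:]):
        start = sentence.find(subj) + len(subj)
        end = sentence.find(obj, start)
        between = sentence[start:end].strip().lower()
        for phrase, name in PATTERNS.items():
            if phrase in between:
                triple = (ENTITIES[subj], name, ENTITIES[obj])
                relations.append(Relation(subj, name, obj, triple in ONTOLOGY_RELATIONS))
    return relations


if __name__ == "__main__":
    for text in ("Alice Smith works for Knowledge Lab.", "Bob Jones visited Knowledge Lab."):
        print(extract_relations(text))
```

In this sketch the first sentence yields a `works_for` relation that the toy ontology already defines, while the second yields a `visited` relation that it does not, which is the kind of case the abstract associates with ontology learning.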
Similar resources
The Light-Weight Semantic Web: Integrating Information Extraction and Information Retrieval for Heterogeneous Environments
Today’s Web, large intranets and even the documents collected by a single user are enormous sources of distributed, heterogeneous information that cannot be easily mastered. Syntactical and semantical differences as well as missing semantic annotations make effective query evaluation on such corpora a hard task. The Semantic Web aims at providing a standard for semantic annotations, but has not...
Towards a Wiki Interchange Format (WIF) Opening Semantic Wiki Content and Metadata
Wikis are increasingly being used in world-wide, intranet and even in personal settings. Unfortunately, current wikis are data islands: people can read and edit them, but machines can only send around text strings without structure. Wiki migration, publishing from one wiki to another one and free choice of syntax hold back broader wiki usage. We define a wiki interchange format (WIF) that allow...
A hybrid approach for relation extraction aimed to semantic annotations
We present an approach for relation extraction from texts aimed to enrich the semantic annotations produced by a semantic web portal. The approach exploits linguistic and empirical strategies, by means of a pipeline method involving processes such as a parser, part-of-speech tagger, named entity recognition system, pattern-based classification and word sense disambiguation models, and resources...
Semantic Web Technologies for Analysis of Transcriptome
The Acacia team studies knowledge management through the building of an organizational memory, which we propose to materialize through an “organizational semantic web” consisting of: • resources: these can be documents (in various formats such as XML, HTML, or even classic formats), but they can also correspond to people, services, software or programs, • ont...
Annotating Relation Mentions in Tabloid Press
This paper presents a new resource for the training and evaluation needed by relation extraction experiments. The corpus consists of annotations of mentions for three semantic relations: marriage, parent–child, siblings, selected from the domain of biographic facts about persons and their social relationships. The corpus contains more than one hundred news articles from Tabloid Press. In the cu...